J. gen. Virol. (1987), 68, 1883-1890. Printed in Great Britain
Key words: TGEV/peplomer protein E2/gene sequence 




The Predicted Primary Structure of the Peplomer Protein E2 of the
Porcine Coronavirus Transmissible Gastroenteritis Virus 

By DENIS RASSCHAERT and HUBERT LAUDE* 

Institut National de la Recherche Agronomique, Station de Recherches de Virologie et
d'Immunologie, F-78850 Thiverval-Grignon, France

(Accepted 31 March 1987)


SUMMARY 

The complete nucleotide sequence of cloned cDNAs containing the E2 glycoprotein-encoding
region of the genome of transmissible gastroenteritis virus (TGEV) has been
determined. A single large translatable frame of 4.3 kb starting at 8.2 kb from the 3' end
of the genome was identified. Its deduced amino acid sequence contains the
characteristic features of a coronavirus peplomer protein: (i) the precursor polypeptide
of TGEV E2 is 1447 residues long (i.e. 285 longer than the avian infectious bronchitis
coronavirus spike protein); (ii) partial N-terminal sequencing demonstrated that a
putative secretory signal sequence of 16 amino acids is absent in the virion-associated
protein; (iii) the predicted mol. wt. of the apoprotein is 158K; most of the 32 potential
N-glycosylation sites available in the sequence are presumed to be functional to account
for the difference between this and the experimentally determined value (200K to
220K); (iv) a typical hydrophobic sequence near the C terminus is likely to be
responsible for anchoring the peplomer to the virion envelope.


INTRODUCTION 

Transmissible gastroenteritis virus (TGEV), a highly enteropathogenic virus of pigs, belongs
to the family Coronaviridae, a group of enveloped viruses with a large, positive-stranded RNA
genome (see Siddell et al., 1983). Studies on the organization and expression of the TGEV
genome (Dennis & Brian, 1982; Hu et al., 1984; Jacobs et al., 1986; Kapke & Brian, 1986;
Rasschaert et al., 1987) tend to confirm the findings reported for two other members of the
Coronaviridae, murine hepatitis (MH) and avian infectious bronchitis (IB) viruses (see Laude et
al., 1987). Three functional classes of polypeptides have been identified in TGEV virions: a
nucleocapsid protein, a matrix protein and a peplomer protein forming the characteristic
surface projections (Garwes et al., 1976). The peplomer protein E2, a highly glycosylated
polypeptide of 200K to 220K, has been shown to elicit the production of neutralizing antibodies
(Laude et al., 1986; Jimenez et al., 1986; Garwes et al., 1987) which are able to confer protection
on suckling piglets (Garwes et al., 1978/79). At least four main antigenic sites have been defined
on the E2 protein by means of topographical and functional mapping with monoclonal antibody
probes (Delmas et al., 1986). Most of the neutralization-mediating determinants appeared to be
grouped in two related sites, both conserved between virus strains. The majority of the epitopes
critical in neutralization appeared to be sensitive to denaturation (Jimenez et al., 1986; Garwes
et al., 1987). In addition, it is anticipated that the peplomer bears virulence-modulating
determinants, as has been suggested in the case of MHV-JHM (Fleming et al., 1986).

Substantial information on peplomer functional organization has been accumulated for other
coronaviruses. The spike protein of IBV comprises two or three copies each of two glycopeptides,
S1 (90K) and S2 (84K). S2 anchors the peplomer to the viral envelope through a short C-terminal
hydrophobic domain and S1 is non-covalently attached to S2 (Cavanagh, 1983; Binns et al.,
1985). Neutralizing and haemagglutinating antibodies bind to the S1 subunit (Mockett et al.,
1984). Removal of S1 abolished infectivity but not attachment to cells (Cavanagh & Davis,
1986). Recently, comparison of the amino acid sequences of two IBV strains, Beaudette (Binns
et al., 1985) and M41 (Niesters et al., 1986), led to the proposal that two candidates for
neutralization epitopes are located near the S1 N terminus (Niesters et al., 1986). In MHV, the
180K E2 protein is cleavable by host cell proteases or by trypsin to form two comigrating
products of 90K. Palmitic acid has been found to be covalently attached to one of the 90K
species, probably defining the membrane-anchored subunit. Proteolytic cleavage of E2 may be
required for membrane fusion activities (Sturman et al., 1985; Frana et al., 1985). Three
independent studies on MHV (Talbot & Buchmeier, 1985) and TGEV (Delmas et al., 1986;
Jimenez et al., 1986) characterized an epitope resistant to antibody selection which might be
essential for productive infection.

In this paper we present the complete sequence of the E2 gene of TGEV. The main features 
predicted from the primary structure of the encoded protein are described and compared with 
those previously reported for the IBV spike protein. 

METHODS 

cDNA cloning and sequencing. The strategy and protocol were as reported previously (Laude et al., 1987). Briefly,
purified genomic RNA was copied by reverse transcriptase using either oligo(dT)12-18 or a specific 30-mer
primer (pE2: 5' CATCATCCTTAACAAAATTCTCTAGCAGAA). RNase T2-treated cDNA-RNA hybrids
were dC-tailed and inserted in PstI-cut dG-tailed pBR322. Transfection of Escherichia coli RR1 and selection of
recombinant clones were performed following standard methods. 'Shotgun' DNA sequencing by Sanger's chain
termination method and sequence analysis were accomplished as described previously (Laude et al., 1987); part of
the 6.47 clone was sequenced using a 15-mer oligonucleotide (p47) instead of the M13mp18 universal primer.
Synthetic oligonucleotides were obtained by the beta amidite method using a Biosearch 8600 DNA synthesizer.
All DNA was sequenced at least twice on each strand.
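
Because reverse transcriptase extends a primer annealed to the positive-stranded genomic RNA, the pE2 oligonucleotide is necessarily complementary to the genome. As a small illustration (the function name is ours and is not part of the original protocol), the genomic-sense target of pE2, written in the DNA alphabet, can be obtained by reverse-complementing the primer:

```python
# Illustrative sketch only: recover the genomic-sense target of primer pE2.
# The primer sequence is taken from Methods; everything else is hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PE2 = "CATCATCCTTAACAAAATTCTCTAGCAGAA"  # 30-mer primer as given in Methods

if __name__ == "__main__":
    print(len(PE2))                 # 30
    print(reverse_complement(PE2))  # TTCTGCTAGAGAATTTTGTTAAGGATGATG
```

Searching a cDNA sequence for the printed target string is one simple way to locate where such a primer would anneal.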

N-terminal sequencing of protein E2. Virion polypeptide E2 resolved by SDS-PAGE was purified by
electroelution as described (Laude et al., 1987), and about 100 pmol were analysed in a 'gas phase' Applied
Biosystems 470A apparatus.


RESULTS 

The coordinates of the four sequenced cDNA clones on the restriction map of the genome are
given in Fig. 1. (The pTG clones 6.3 and 6.47 were derived using pE2 as a primer.) In Northern
blot analyses, the pTG2.26 insert was shown to contain sequences that hybridized only with the
two largest RNA species detected in TGEV-infected cells: the genomic RNA and subgenomic
RNA 2 (data not shown). DNA sequencing led to the identification of a 4341 base open reading
frame (ORF), which was an obvious candidate for the E2 gene. The sequences encompassing the
E2 ORF are presented in Fig. 2. A characteristic feature is that it is flanked by an identical
sequence 5' ACTAAACTT 3' at each end. A homologous sequence has been identified within
each intergenic junction (Rasschaert et al., 1987). The first consensus sequence is located 25
bases upstream from the potential ATG initiation codon, and maps at 8.25 kb from the 3' end of
the genome, which is in agreement with the size estimates for RNA 2 (Hu et al., 1984; Jacobs et
al., 1986; Rasschaert et al., 1987). The second consensus sequence is followed 22 bases
downstream by an ORF putatively located at the 5' end of RNA 3 (partly shown in Fig. 2). The
deduced sequence of the 1447 amino acid primary translation product and its hydrophilicity
profile are shown in Figs 2 and 3, respectively. A hydrophobic stretch with the characteristics
of a eukaryotic signal peptide is predicted at the N terminus (Von Heijne, 1986). Indeed,
partial N-terminal microsequencing demonstrated that the first 16 residues were absent
from the virion-associated E2 protein, as its N-terminal sequence was found to be
XXFPCSKLTXRTIGNQ. Accordingly, the mature product would be 1431 amino acids long. It
has a predicted mol. wt. of 158316, comprising 126 acidic and 91 basic residues. There are 33.5%
hydrophobic residues. Thirty-two sites for N-glycosylation (Asn-X-Ser or Asn-X-Thr) occur in
the sequence, involving as many as 27.5% of the available Asn residues. Most of them are
associated with a hydrophilic segment of the E2 polypeptide (Fig. 3).
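
The search for potential N-glycosylation sites can be illustrated with a short script. The sketch below applies the rule exactly as stated in the text (Asn-X-Ser or Asn-X-Thr, with X unrestricted); the common refinement of excluding X = Pro is not mentioned here and is therefore not applied. The function name and the toy peptide are hypothetical and are not taken from the TGEV sequence.

```python
import re


def find_sequons(protein: str) -> list[int]:
    """Return the 1-based positions of Asn in every Asn-X-Ser/Thr tripeptide.

    A lookahead is used so that overlapping sites (e.g. N-N-S-S) are all found.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=[A-Z][ST])", protein.upper())]


if __name__ == "__main__":
    toy = "MKNATLLNNSSQNPTA"  # made-up peptide for illustration only
    print(find_sequons(toy))   # [3, 8, 9, 13]
```

Dividing the number of sites found by the total number of Asn residues gives the kind of proportion quoted above (32 sites representing 27.5% of the Asn residues implies roughly 116 Asn in the sequence).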

Fig. 1. Restriction map of the region of TGEV genome encoding the E2 gene (bar). The positions of the
four cDNA clones and of the two primers used are shown.


DISCUSSION 

cDNA copies of the TGEV genome covering the 5' coding region of mRNA 2 were
sequenced. A single large translation frame was found, yielding a 1447 amino acid product with
the characteristics of a coronavirus peplomer glycoprotein. The other identified ORFs did not
exceed 200 bases and are within the E2 gene (not shown). The first ATG of the E2 ORF is
positioned 24 bases downstream from a consensus sequence which is assumed to be the start of
the mRNA 2 transcript (Fig. 2). The sequence upstream from the ATG codon (C ACC ATG A) is
optimal for initiation by eukaryotic ribosomes [(CC)ACCATGG; Kozak, 1986]. Moreover, the
first Met is followed by a leader sequence that has been shown to be removed from the mature
protein. The deduced cleavage site for signal peptidase is located after the 16th residue, between
Gly and Asp. Inspection of the nucleotide data revealed the occurrence, 120 bases downstream,
of an additional consensus sequence, TTCTAAACTA, which could function in the initiation of
transcription. However, the next ATG in frame with the E2 ORF occurs only at position 520
and is not followed by a peptide sequence likely to translocate E2 into the membrane.
Comparison of our results with previously reported partial nucleic acid data indicates a
discrepancy at the 3' end of the E2 ORF, where the ORF was reported to be 3.9 kb long with
GCCATGA at the 3' terminus (Hu et al., 1984). Our data show that the sequence GCCTAGA occurs
instead, at 3.8 kb from the initiation codon. Hence, the ORF extends up to a double stop sequence
CCATTAAATTTAA occurring at 4.3 kb, and thereby includes the sequences predicting the
anchor structure of the protein (see below).
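
The identification of a single large translation frame can be sketched as follows. This is a minimal illustration and not the authors' analysis software, which is not specified here: it scans the three forward frames of a DNA string for the longest stretch running from an ATG to the first in-frame stop codon. The function name and the toy input are hypothetical. Note that 4341 bases correspond to 4341/3 = 1447 codons, in agreement with the length of the primary translation product.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the given strand.

    Coordinates are 0-based and half-open; the stop codon is excluded.
    Returns (0, 0) when no complete ORF is found.
    """
    dna = dna.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
    return best


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # made-up sequence: ATG AAA TTT GGG TAA
    print(longest_orf(toy))       # (2, 14)
    print(4341 // 3)              # 1447
```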

The deduced mol. wt. of the virion-associated E2 is 158K (apoprotein), a value in close
agreement with the Mr 160K determined for the TGEV E2 unglycosylated form in tunicamycin-treated
cells (Jacobs et al., 1986; B. Delmas & H. Laude, unpublished results). The 130K Mr
species detected by translation of mRNA 2 in reticulocyte lysates might thus correspond to an
incomplete translation product (Jacobs et al., 1986). In the mature polypeptide, the
carbohydrate moiety should approach 27% of the total weight, implying that a large proportion
of the 32 potential sites for N-glycosylation are functional. An equivalent high sugar content has
been reported for the IBV spike protein (Binns et al., 1985).
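
The carbohydrate estimate follows from simple arithmetic on the values quoted in the text. The sketch below (function name ours) computes the sugar fraction as the difference between the glycoprotein and apoprotein weights divided by the glycoprotein weight, for both ends of the experimentally determined 200K to 220K range; it assumes that the whole difference is due to carbohydrate.

```python
def carbohydrate_fraction(apoprotein_k: float, glycoprotein_k: float) -> float:
    """Fraction of the glycoprotein weight not accounted for by the polypeptide."""
    return (glycoprotein_k - apoprotein_k) / glycoprotein_k


if __name__ == "__main__":
    for total in (200.0, 220.0):
        # 158K is the predicted apoprotein weight given in the text
        print(total, round(carbohydrate_fraction(158.0, total), 3))
    # 200.0 0.21
    # 220.0 0.282
```

On these figures the carbohydrate content lies between about 21% and 28% of the total weight, so the value of 27% corresponds to the upper part of the observed range.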

The hydrophilicity profile predicts that E2 is hydrophobic overall (Fig. 3), reflecting the
spatial importance of the tightly packed core in the peplomer. The amino half of the E2 chain
shows few highly hydrophilic (virtually exposed) segments, whereas several prominent peaks
are visible in the carboxy half. Examination of the sequence near the C terminus reveals the
presence of a highly hydrophobic segment, comprising 45 non-polar residues, including 11
cysteines (Fig. 2). A similar structure has been previously noted in the IBV spike protein (44
hydrophobic residues including six Cys; Binns et al., 1985). Such a high ratio of cysteine
residues (24.5% as compared to 3.4% for the whole molecule) in the presumptive anchor region
of the peplomer seems so far to be a distinctive feature of the coronaviruses. In both viruses,
the Cys residues cluster mainly in the carboxy-distal half of the hydrophobic domain.
Hypothetically, these residues may serve as a site for covalent linkage of fatty acid chains (see
Schmidt, 1983), as one E2 subunit of MHV has been reported to be acylated (Sturman et al.,
1985). Moreover, an eight residue segment, KWPWYVWL, probably corresponding to the site
of entry into the membrane, is perfectly conserved in TGEV and IBV (Fig. 2). In both cases, this

Fig. 2. The nucleotide sequence and the predicted amino acid sequence of the peplomer protein of the
Purdue-115 strain of TGEV. The consensus decanucleotides (see text) are boxed. Proximal ATG
codons are underlined, stop codons are overlined. Amino acids are numbered at the right. Potential sites
for N-glycosylation (NXT or NXS) are underlined. The potential signal peptide and membrane-anchoring
domain are indicated by open and closed lines respectively. Homology regions (at least three
consecutive matches) with the spike sequence of the Beaudette strain of IBV are boxed.

Fig. 3. Hydrophilicity plot of the TGEV E2 precursor polypeptide (abscissa: amino acid number).
Running average taken over a hexapeptide using the hydrophilicity values of Hopp & Woods (1981).
Bars in the upper panels indicate the N-glycosylation sites; hatched areas represent the signal
peptide and the anchoring domain respectively; dotted area indicates a predicted long amphipathic
α-helix; the relative positions of the IBV spike protein signal (▽) and connecting (▼) peptides
are indicated.
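
A profile of the kind plotted in Fig. 3 can be approximated with a short script. The sketch below assumes the Hopp & Woods (1981) scale as it is commonly tabulated and a simple unweighted six-residue window; the function and variable names are illustrative, and the program actually used by the authors is not described in the text.

```python
# Hopp & Woods (1981) hydrophilicity values, as commonly tabulated (assumption).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein: str, window: int = 6) -> list[float]:
    """Running average of hydrophilicity over `window` consecutive residues.

    Raises KeyError for symbols outside the 20 standard amino acids (e.g. X).
    """
    values = [HOPP_WOODS[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Conserved segment quoted in the Discussion; all windows are hydrophobic.
    for value in hydrophilicity_profile("KWPWYVWL"):
        print(round(value, 2))
```

Negative values indicate hydrophobic windows and positive values hydrophilic, potentially exposed ones.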


Heptad position         a     b     c     d     e     f     g
Hydrophobic residues    5/7   7/7   3/7   3/7   6/7   2/7   0/7

Fig. 4. Search for a stable elongated structure in the TGEV peplomer. The amino acids are listed
horizontally following a heptad pattern (two α-helix turns). Residues in the columns b and e may form
the interface between the chains in an α-helical coiled-coil structure.


region of E2 is preceded by a cluster of N-glycosylation sites, starting at a markedly hydrophilic
stretch of 20 amino acids (Fig. 3). The C-terminal hydrophilic segment of TGEV E2 (16 amino
acids) is significantly shorter than that reported for both the Beaudette and M41 strains of IBV
(Binns et al., 1985; Niesters et al., 1986). Recently de Groot et al. (1987) have described the
presence of two heptad repeats in the peplomer proteins of MHV, IBV and feline infectious
peritonitis virus, which are indicative of a coiled-coil structure. This structure could provide an
explanation for the elongated shape of the peplomer. A Fourier transform of the distribution of
hydrophobic residues in the TGEV E2 chain allowed us to characterize a segment of about 55
residues having a strong propensity to form an amphipathic structure with dominant periodicity
of 100° ± 20° (De Lisi & Berzofsky, 1985). This segment is located in a region of E2 which is
devoid of both Pro and Cys residues (1037 to 1184). In addition, few aromatic residues are
present in the heptapeptide repeat (Fig. 4). This predicts an 8 nm long α-helical, possibly
coiled-coil segment.
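
The periodicity analysis can be illustrated with a minimal sketch. It is not the program set up for this study, whose details are not given; it simply evaluates the power of a per-residue hydrophobicity series at a given angular frequency and reports the angle of maximal power. An ideal α-helix has 3.6 residues per turn, i.e. 360/3.6 = 100° per residue, and a rise of about 0.15 nm per residue, so that 55 residues span roughly 8 nm. The function names, the angular grid and the synthetic test signal are assumptions made for illustration.

```python
import cmath
import math


def periodicity_power(values, angle_deg):
    """Squared modulus of sum_k (h_k - mean) * exp(i * k * theta)."""
    mean = sum(values) / len(values)
    theta = math.radians(angle_deg)
    total = sum((h - mean) * cmath.exp(1j * k * theta) for k, h in enumerate(values))
    return abs(total) ** 2


def dominant_angle(values, angles=range(0, 181, 5)):
    """Angle (degrees per residue) at which the power is maximal on the grid."""
    return max(angles, key=lambda a: periodicity_power(values, a))


def helix_length_nm(n_residues, rise_nm=0.15):
    """Approximate length of an alpha-helix, assuming 0.15 nm rise per residue."""
    return n_residues * rise_nm


if __name__ == "__main__":
    # Synthetic 55-residue signal with an exact 100 degree periodicity.
    synthetic = [math.cos(math.radians(100 * k)) for k in range(55)]
    print(dominant_angle(synthetic))       # 100
    print(round(helix_length_nm(55), 2))   # 8.25
```

With real data, the input would be the hydrophobicity value of each residue in a sliding segment of the E2 chain, using whichever scale is preferred.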

Three other features were noted while aligning optimally the TGEV and IBV E2 protein
sequences (not shown). First, the overall highest homology obtained by Dayhoff's alignment is
32.3% (with 12.5% residues unmatched), which is consistent with the fact that the two viruses
belong to separate antigenic groups. Most of the stringent homology regions (boxed in Fig. 2)
cluster in the carboxy halves of the molecules. In particular, a hydrophilic stretch of 11 residues
at position 1144 (TGEV precursor) is perfectly conserved. The sequences are markedly
divergent in the amino part, except for one conserved region at positions 686 to 697. Second, a
basic sequence DRTRG occurs in TGEV E2 at position 782, at about the same distance from
the C terminus as the sequence RRFRR in the IBV spike protein, where the S1/S2 cleavage site
has been demonstrated (Cavanagh et al., 1986). Third, the predicted TGEV E2 mature protein
contains 287 residues more than the IBV protein, a difference expected from the comparison of
their respective Mr values. The characteristic Lys-Val-Thr twofold repeat present in the IBV
signal peptide is conserved in the TGEV E2 homologous sequence (positions 289 to 296). This,
along with a tentative alignment of the sequences, suggests that the extra TGEV E2 sequence
largely protrudes at the NH2 terminus. Whether or not a specific function is associated with this
sequence is not clear.

This paper, together with two papers reporting the sequences of the nucleocapsid N (Kapke &
Brian, 1986) and the transmembrane E1 (Laude et al., 1987) proteins, provides a complete set of
data on the major structural proteins of TGEV. The availability of cloned TGEV peplomer
sequences, along with a panel of monoclonal antibodies and of neutralization-resistant mutants,
will allow localization of functionally important epitopes.

We thank J. Gelfi for her intelligent technical assistance, and J. C. Huet and J. C. Pernollet (Laboratoire 
F. Borras-Cuesta (Laboratoire de Virologie, Thiverval-Grignon), who set up the program for prediction of the
amphipathic α-helix, is gratefully acknowledged. Thanks are also due to A. Kumar for revising the English
manuscript. Part of the results were presented at the Third International Coronavirus Symposium (Asilomar,